In this section we describe how to create acute inpatient encounters (i.e. hospital stays) from medical claims data. At the highest level this involves two steps:
Identifying claims that occurred during an acute inpatient hospital stay
Merging these claims into a single encounter
We go into significant detail about how this is done in the Tuva data model and how to use data tables in the data model to identify possible data quality problems and perform analytics.
First we need to connect to our database. We do this with the following connection.
Code
%%captureimport snowflake.connectorimport pandas as pd# Connect to Snowflake with SSOconn = snowflake.connector.connect( user="aaron@tuvahealth.com", account="ksa27360.us-east-1", authenticator="externalbrowser", warehouse="COMPUTE_WH", database="medicare_lds_five_percent", schema="PUBLIC", role="accountadmin")
9.1 Identifying Acute Inpatient Claims
The first step in building acute inpatient encounters is identifying claims that occurred during acute inpatient hospital stays. We need to do this for both institutional and professional claims.
The following fields are commonly used to identify acute inpatient institutional claims: - Bill type code in (11X, 12X) - Any valid MS- or APR-DRG - Room and board revenue center codes
Bill type codes equal to 11X or 12X should only be found on acute inpatient claims. The same is true for any valid MS- or APR-DRGs. However, room and board revenue center codes can be found on a wide variety of inpatient claims (e.g. SNF, inpatient rehab, etc.).
Given these fields are essential for this analysis, we need to quickly assess whether they have any data quality problems. For bill type and DRG fields we need to check the following issues: - Missing: Every claim line should have a value populated - Invalid: Every value should exist in the official terminology table - Duplicate: Every claim should have one and only one value
By comparison, revenue center codes should meet the missing and invalid criteria, however, each institutional claim can and typically does have more than one value.
The data quality summary in acute inpatient can quickly reveal whether these fields have any of these problems. Let’s query this table and take a look.
Code
%%capture# Query the acute IP data quality summary table and print a dataframe with the relevent rows.query ="select * from input_layer.medical_claim limit 100000"df = pd.read_sql(query, conn)
Code
import pandas as pdimport plotly.express as px# Normalize column names to upper casedf.columns = df.columns.str.upper()# Ensure CLAIM_START_DATE is a datetime fielddf['CLAIM_START_DATE'] = pd.to_datetime(df['CLAIM_START_DATE'])# Extract month and year for groupingdf['YEAR'] = df['CLAIM_START_DATE'].dt.yeardf['MONTH'] = df['CLAIM_START_DATE'].dt.strftime('%b') # Short month namedf['YEAR_MONTH'] = df['CLAIM_START_DATE'].dt.to_period('M')# Group by month-year and count unique claimsmonthly_claims = df.groupby(['YEAR_MONTH', 'YEAR', 'MONTH'])['CLAIM_ID'].nunique().reset_index()monthly_claims['YEAR_MONTH'] = monthly_claims['YEAR_MONTH'].dt.to_timestamp()# Create the trend chart with Plotlyfig = px.line( monthly_claims, x='YEAR_MONTH', y='CLAIM_ID', title='Monthly Claim Volume Trend', labels={'CLAIM_ID': 'Number of Unique Claims'}, markers=True)# Customize the layoutfig.update_layout( title_font_size=20, xaxis_title='', yaxis_title='Number of Unique Claims', xaxis_tickangle=0, template='plotly_white',)# Update x-axis ticks to display months and yearsfig.update_xaxes( tickformat='%b', # Show abbreviated month names dtick='M1', # Set ticks for every month ticklabelmode='period', # Period-aligned ticks showgrid=False,)# Display the chartfig.show()